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Abstract 

Computing  the  optimal  expansion  of  a  signal  in  a  redundant  dictionary  of 
waveforms  is  an  NP-complete  problem.  We  introduce  a  greedy  algorithm 
called  a  matching  pursuit  which  computes  a  sub-optimal  expansion.  The 
dictionary  waveforms  which  best  match  a  signal’s  structures  are  chosen  iter¬ 
atively.  An  orthogonalized  version  of  the  matching  pursuit  is  also  developed. 
Matching  pursuits  are  general  procedures  for  computing  adaptive  signal  rep¬ 
resentations.  With  a  dictionary  of  Gabor  functions,  a  matching  pursuit 
defines  an  adaptive  time- frequency  transform.  We  derive  a  signal  energy 
distribution  in  the  time- frequency  plane  which  does  not  contain  interference 
terms,  unlike  the  Wigner  and  Cohen  class  distributions.  A  matching  pursuit 
is  a  chaotic  map  whose  asymptotic  properties  are  studied.  We  describe  an 
algorithm  which  isolates  the  coherent  structures  of  a  signal  and  show  an 
application  to  pattern  extraction  from  noisy  signals. 
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1  Introduction 


In  this  paper  we  focus  on  the  problem  of  approximating  functions  using 
linear  combinations  of  a  small  number  waveforms.  To  obtain  a  compact  ex¬ 
pansion  of  a  function  which  contains  complex  structures,  we  must  adapt  our 
expansion  to  the  various  components  of  the  function.  Examples  of  such  a  lin¬ 
ear  expansions  include  triangular  mesh  approximations  to  surfaces,  used  in 
scientific  computing  applications.  These  mesh  expansions  can  be  adapted  to 
obtain  low  approximation  error  with  a  small  number  of  triangles  by  varying 
the  size  and  shape  of  the  triangles  according  to  the  approximated  surface’s 
local  properties.  Adaptive  linear  expansions  can  be  used  to  extract  informa¬ 
tion  from  signals.  We  obtain  an  adaptive  time- frequency  decomposition  of  a 
signal  by  expanding  the  signal  into  a  sum  of  waveforms  whose  localizations 
in  time  and  frequency  match  those  of  the  different  signal  structures.  Such 
adaptive  time-frequency  representations  are  important  in  signal  processing 
applications  such  as  speech  analysis. 

The  waveforms  which  we  use  for  our  expansions  are  drawn  from  a  large 
and  redundant  collection,  called  a  dictionary.  In  section  2  we  examine  the 
computational  complexity  of  optimally  approximating  a  function  with  a 
linear  combination  of  vectors  from  a  dictionary.  We  prove  that  in  a  fi¬ 
nite  dimensional  space,  computing  the  optimal  solution  is  an  NP-complete 
problem,  which  motivates  the  use  of  sub-optimal  greedy  algorithms.  We 
introduce  the  matching  pursuit  algorithm,  a  greedy  algorithm  which  com¬ 
putes  function  expansions  by  iteratively  selecting  dictionary  vectors  which 
best  correlate  to  signal  structures.  An  orthogonal  version  of  the  matching 
pursuit  algorithm  is  also  described  and  compared  with  the  non-orthogonal 
algorithm.  An  application  of  matching  pursuits  to  finding  adaptive  time- 
frequency  decompositions  is  explained  in  section  6.  A  signal  is  decomposed 
into  waveforms  selected  from  a  dictionary  of  time- frequency  atoms,  a  collec¬ 
tion  of  dilations,  translations,  and  modulations  of  a  single  window  function. 
We  construct  a  time-frequency  energy  distribution  by  summing  the  Wigner 
distributions  of  the  selected  time-frequency  atoms.  Unlike  the  Wigner  dis¬ 
tribution  or  Cohen’s  class  distributions,  this  energy  distribution  does  not 
include  interference  terms  and  thus  provides  a  clear  picture  of  the  time- 
frequency  plane. 

Matching  pursuits  have  chaotic  properties  which  are  analyzed  for  par¬ 
ticular  dictionaries.  As  the  number  of  iterations  of  a  matching  pursuit 
increases,  the  approximation  error  converges  to  the  realization  of  a  noise 
process  whose  energy  is  uniformly  spread  across  all  the  dictionary  vectors. 
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From  this  asymptotic  behavior,  we  derive  an  algorithm  that  selects  coher¬ 
ent  signal  structures  from  noisy  signals.  We  describe  an  application  to  noise 
removal  in  speech  recordings. 


2  Optimal  Adaptive  Approximations  in  Dictio¬ 
naries 


We  expand  functions  from  a  Hilbert  space  H  into  linear  combinations  of 
vectors  from  a  large  collection  V  =  (07(;f))  p,  with  ||c/7||  =  1,  called 

a  dictionary.  The  dictionary  is  constructed  so  that  linear  combinations  of 
dictionary  vectors  are  dense  in  H.  The  smallest  possible  dictionary  is  a  basis 
of  H,  but  in  practice  a  dictionary  is  a  very  redundant  set.  The  redundancy 
gives  an  increased  degree  of  freedom  in  constructing  function  expansions, 
and  this  freedom  is  used  in  order  to  obtain  improved  convergence  properties 
of  the  expansions. 

For  signal  processing  applications,  we  study  the  properties  of  dictionar¬ 
ies  composed  of  waveforms  that  are  well  concentrated  both  in  time  and 
frequency.  Our  signal  space  is  L2(R)  and  we  construct  such  a  dictionary  by 
scaling,  translating  and  modulating  a  single  window  function  g{t)  6  L2(R). 
We  suppose  that  g(t )  is  real  and  centered  at  0.  We  also  impose  that  ||g||  =  1, 
that  the  integral  of  g(t)  is  non-zero  and  g( 0)  ^  0.  For  any  scale  s  >  0,  fre¬ 
quency  modulation  £  and  translation  a.  we  denote  7  =  (s,  u.  £)  and  define 

•>,(<)  =  (1) 


,  2  1 

The  index  7  is  an  element  of  the  set  T  =  R+xR  .  The  factor  normalizes 

V-s 

to  1  the  norm  of  g-y(t).  The  function  g1(t)  is  centered  at  the  abscissa  u  and 
its  energy  is  concentrated  in  a  neighborhood  of  u.  whose  size  is  proportional 
to  s.  Let  g(io )  be  the  Fourier  transform  of  g(t).  Equation  (1)  yields 


=  \/sg(s(u>  -  0)e  l{uJ  i)l 


(2) 


Since  |<7(w)]  is  even,  |g7(u;)|  is  centered  at  the  frequency  ui  =  £.  Its  energy  is 
concentrated  in  a  neighborhood  of  £,  whose  size  is  proportional  to  1/s.  The 
dictionary  of  time-frequency  atoms  V  =  (57(f))7ep  is  a  very  redundant 

set  of  functions  in  L2(R)  that  includes  window  Fourier  frames  and  wavelet 
frames  [3].  Instead  of  expanding  the  signal  on  such  a  frame  that  is  chosen 
a  priori,  we  want  to  choose  within  V  the  time-frequency  atoms  that  are 
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best  adapted  to  expand  a  given  f(t).  A  first  issue  is  to  define  a  notion  of 
“optimal”  approximation  within  a  given  dictionary. 

Definition  1  Let  e  >  0.  An  e-approximation  of  f  £  H  is  a  linear  expansion 
of  dictionary  vectors 

f= 

0<n<M 

for  which 

ll/-/ll<*.  (3) 

/  is  an  optimal  e-approximation  if  M  is  the  minimum  integer  for  which  (3) 
is  satisfied. 

To  determine  the  computational  complexity  of  obtaining  these  optimal 
solutions,  we  consider  a  space  H  of  finite  dimension  and  dictionaries  V  of  size 
0(Nk)  for  some  k  >  0.  We  encode  all  quantities  with  0(NP )  bits  for  some 
positive  p.  We  say  that  an  algorithm  solves  the  optimal  e- approximation 
problem  if,  for  any  given  /  £  H,  any  V  of  size  0 ( A  k ) ,  and  any  e  >  0.  we 
can  find  an  optimal  e- approximation  for  /.  The  following  result  shows  that 
this  problem  is  computationally  intractable. 

Theorem  1  The  optimal  e-approximation  problem  is  NP-hard. 

In  large  dimensional  spaces,  it  is  therefore  not  feasible  to  compute  opti¬ 
mal  e- approximations.  Theorem  1  is  proved  by  showing  that  any  instance 
of  the  Exact  Cover  by  3- Sets  problem  [6]  can  be  transformed  in  polynomial 
time  into  an  optimal  e-approximation  problem.  Thus,  an  algorithm  which 
solves  the  e-approximation  problem  can  solve  the  NP-complete  Exact  Cover 
by  3-Sets  problem.  The  intractability  of  the  approximation  problem  is  due 
to  the  coupling  between  terms  in  the  function  expansions  when  the  dictio¬ 
nary  elements  are  not  orthogonal.  We  map  the  overlapping  sets  in  the  Ex¬ 
act  Cover  by  3- Sets  problem  to  a  set  of  coupled,  non- orthogonal  dictionary 
vectors.  When  the  dictionary  vectors  are  orthogonal,  this  coupling-induced 
complexity  vanishes.  We  can  solve  the  problem  in  0(  N  log  A)  by  sorting  the 
inner  products  {|  <  /,  g^  >  |2}7gr  and  finding  the  minimum  M  for  which 
the  sum  of  the  M  largest  terms  satisfies  ||/||2  —  J2t=i  I  <  /■  g-:>  >  |2  <  e. 

In  addition  to  the  computational  complexity,  an  important  issue  is  that 
this  optimization  criteria  can  lead  to  numerically  unstable  expansions.  We 
can  construct  examples  where  the  l2  norm  of  the  expansion  coefficients  is 
arbitrarily  larger  than  ||/]|2,  i.e. 

£  IA.I2  »  ll/ll2.  (4) 

0  <n<M 
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This  instability  can  be  avoided  by  imposing  the  constraint  that  the  approx¬ 
imation  must  satisfy 

E  l^|2<  A'|]/||2,  (5) 

0  <n<M 

for  some  fixed  K  >  1.  One  can  prove  that  there  always  exist  such  e- 
approximations . 

When  c  is  modified,  the  dictionary  vectors  that  appear  in  the  optimal 
approximation  can  change  completely,  which  prevents  us  from  computing 
optimal  approximations  using  progressive  refinement.  This  instability  of 
the  optimal  approximations,  together  with  the  computational  intractibil- 
ity  of  computing  them,  leads  us  to  use  greedy  sub-optimal  algorithms  that 
progressively  refine  the  functional  approximation  by  choosing  appropriate 
dictionary  vectors. 

3  Matching  Pursuit 

Let  /  G  H.  We  want  to  compute  a  linear  expansion  of  /  over  a  set  of 
vectors  selected  from  V  which  best  matches  the  inner  structures  of  /.  A 
matching  pursuit  is  a  greedy  algorithm  which  successively  approximates  / 
with  orthogonal  projections  onto  elements  of  V.  Let  gl0  E  V.  The  vector  / 
can  be  decomposed  into 


/  =<  f,910  >  £70  +  Rf,  (6) 

where  Rf  is  the  residual  vector  after  approximating  /  in  the  direction  of 
gl0.  Clearly  gl0  is  orthogonal  to  Rf ,  hence 

ll/ll2  =  l<  /,ff7o  >P  +  \\Rf\\2-  (7) 

To  minimize  ||_R/||,  we  must  choose  g7o  E  V  such  that  |  <  f,g10  >  |  is 
maximal.  In  some  cases,  it  is  only  possible  to  find  a  vector  gl0  that  is  close 
to  the  maximum  in  the  sense  that 

|<  f,910  >|  >  « sup  |<  f,g1  >|,  (8) 

7gr 

where  a  E  (0, 1]  is  an  optimality  factor. 

We  sub-decompose  the  residue  Rf  by  projecting  it  onto  the  vector  of  V 
that  best  matches  Rf,  as  was  done  for  /.  This  projection  of  Rf  generates 
a  second  residue,  R2 /,  which  we  again  decompose  to  obtain  a  third  residue, 
and  so  on. 
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Let  us  explain  by  induction  how  the  matching  pursuit  is  carried  further. 
Let  R° f  =  /.  We  suppose  that  we  have  computed  the  nth  order  residue 
Rnf,  for  it  >  0.  We  choose  with  the  choice  function  C  an  element  gln  £  V 
which  closely  matches  the  residue  Rn f 

|  <  Rnf,gln  >  |  >  a  sup  |  <  Rnf,g1  >  |.  (9) 

7er 

The  residue  Rn  f  is  sub-decomposed  into 

Rnf=<  Rnf,gln  >g^  +  Rn+1f,  (10) 

which  dehnes  the  residue  at  the  order  n+1.  Since  Rn+1  f  is  orthogonal  to 
gln ,  we  have 

\\Rnf\2  =  |  <  Rnf,gyn  >  |2  +  \\Rn+1f\\2.  (11) 

Let  us  carry  this  decomposition  up  to  the  order  m.  We  decompose  /  into 
the  telescoping  sum 

m— 1 

/  =  E  ( Rnf  -  Rn+1f )  +  Rmf-  (12) 

n= 0 

Equation  (10)  yields 

m  —  1 

/=  E  <  R':f-!H.  >  +  «"'/•  (13) 

n= 0 

Similarly,  we  write  ||/||2  as  a  telescoping  sum 

m— 1 

l/l2  =  E  (ll^/I2  -  ll^n+1/ll2)  +  II ir/f  (i4) 

n= 0 

which  we  combine  with  (11)  to  obtain  an  energy  conservation  equation 

m  —  1 

l/ll3  =  £  I <Rnf,g1„  >|2  +  Ifl”7||2.  (15) 

n= 0 

Thus,  the  original  vector  /  is  decomposed  into  a  sum  of  dictionary  elements 
which  are  chosen  to  best  match  its  residues.  Although  this  decomposition  is 
non-linear,  we  maintain  an  energy  conservation  as  though  it  were  a  linear, 
orthogonal  decomposition.  An  important  issue  is  to  understand  the  behavior 
of  the  residue  Rmf  when  m  increases.  By  transposing  a  result  proved  by 
Jones  [9]  for  projection  pursuit  algorithms  [5],  one  can  prove  [10]  that  the 
matching  pursuit  algorithm  converges,  even  in  infinite  dimensional  spaces. 
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Theorem  2  Let  /  £  H.  The  residue  Rmf  defined  by  the  induction  equation 
(10)  satisfies 


lim  \\Rmf\\=0. 

oo 


(16) 


Hence 


+  00 


/  =  E  <  Rnfi9-m  > 

n= 0 


(17) 


and 


+  00 

£!<£”/,  <?7n 


n=0 


(18) 


117;  en  H  is  of  finite  dimension,  ||JRm/||  decays  exponentially  to  zero. 


This  theorem  proves  that  any  vector  /  is  characterized  by  the  double 
sequence  (<  Rnf,g-y„  >57n)neN;  called  a  structure  book,  which  specifies 
the  expansion  coefficients  and  the  index  of  each  chosen  vector  within  the 
dictionary. 


4  Back-projection  and  Orthogonal  Pursuit 

After  m  iterations,  a  matching  pursuit  decomposes  a  signal  /  into 

m  —  1 

/=  E  <Rnf,9ln>9ln  +  Rmf.  (19) 

n= 0 

If  we  stop  the  algorithm  at  this  stage  and  only  record  the  partial  structure 
book  (<  Rnf,gln  >,7n)0<n<m,  the  summation  of  equation  (19)  recovers  an 
approximation  of  /  with  error  Rm /.  However,  this  sum  is  not  the  linear 
expansion  of  the  vectors  (//'.,.  )o<«<«s.  which  best  approximates  /.  Let  Vm 
be  the  space  generated  by  (g7n)o<n<m  and  Pym  be  the  orthogonal  projector 
onto  Vm.  For  any  /  £  H,  Py  /  is  the  closest  vector  to  /  that  can  be  written 
as  linear  expansion  of  the  m  vectors  ( </7„)o<n<m-  We  derive  from  (19)  that 

m— 1 

PvJ  =  E  <  f-9~?n  >  9ln  +  P VmRmf-  (20) 

n= 0 

If  the  family  of  vectors  (^7n)o<n<m  is  not  orthogonal,  which  is  generally  the 
case,  then  Py  Rmf  /  0.  The  computation  of 

m— 1 

P VmRmf=  E  XnS'm,  (21) 

n= 0 
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is  called  a  back-projection.  Instead  of  storing  the  inner  products  <  Rnf,g-yn  > 
in  the  structure  book,  we  store  <  Rn  f,gln  >  t ./:«  in  order  to  recover  Py  f 
with  (20).  In  this  case,  the  approximation  error 

PwJ=/-PvJ  (22) 

is  the  orthogonal  projection  of  /  on  the  space  Wm,  the  orthogonal  comple¬ 
ment  of  Vm  in  H.  The  calculation  of  the  coefficients  (as„)o <n<m  requires 
that  we  solve  the  following  linear  system.  For  any  glk,  0  <  k  <  m, 

m—  1 

<  P  vmRmf,glk  >=<  Rmf,glk  >=E^<  9ln,9^k  >  •  (23) 

n= 0 

Let  us  denote  X  =  (xn)0<n<m  and  Y  =  (<  Rmf,glk  >)o<k<m-  Let  G  = 
(<  g-,n ,  g~.k  >)o<k<m,o<n<m  be  the  Gram  matrix  of  the  family  of  selected 
vectors.  The  linear  system  of  equations  (23)  can  be  written  Y  =  GX .  A 
solution  of  this  system  is  computed  efficiently  with  a  conjugate  gradient 
algorithm  [10].  If  H  is  of  finite  dimension  N,  there  are  many  classes  of 
dictionaries  for  which  any  collection  of  N  distinct  dictionary  vectors  is  a 
basis  of  H.  This  is  the  case  for  the  Gabor  dictionary  used  for  time- frequency 
decompositions.  Hence,  after  selecting  N  different  vectors  with  a  matching 
pursuit,  the  back-projection  reduces  to  0  the  remaining  residue. 

Instead  of  recovering  the  orthogonal  projection  Py  /  at  the  end  of  the 
matching  pursuit,  one  can  modify  the  pursuit  algorithm  by  computing  this 
orthogonal  projection  when  selecting  each  new  vector  of  the  dictionary.  It  is 
more  efficient  to  orthogonalize  the  family  of  selected  vectors  with  a  Gram- 
Schmidt  algorithm  than  to  perform  a  back  projection.  This  type  of  algo¬ 
rithm  was  first  introduced  for  control  applications  [1]  and  also  also  studied 
independently  from  this  work  by  Pati  et  al.  [11],  It  has  the  advantage  of 
providing  better  approximations  than  the  matching  pursuit  algorithm,  but 
it  requires  much  more  computation  and  can  introduce  numerical  instabilities 
into  the  expansions.  We  describe  by  induction  this  orthogonal  pursuit. 

For  n  —  0,  we  set  R° f  =  f.  Like  in  a  matching  pursuit,  we  define  a 
supremum  factor  a,  with  0  <  a  <  1,  and  choose  glQ  G  V  which  satisfies 

l<  f,9i6  >1  >  « sup  |<  f,gn  >|.  (24) 

7er 

The  space  Vi  is  generated  by  the  single  vector  glQ.  The  first  vector  uo  of 
the  Gram- Schmidt  basis  is  gl0.  The  next  residue  is  defined  by 

Rf  =  f-’Pv1f  =  f-<f,9%>9'*.  (25) 
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Let  us  explain  by  induction  how  to  compute  the  orthogonal  residue 
Rn+1  f  from  Rnf.  We  suppose  that  we  have  already  selected  n  vectors 
(g-yp)o<p<n  that  are  linearly  independent  and  that  we  computed  the  corre¬ 
sponding  Gram-Schmidt  orthogonal  basis  (up) o<p<n-  Both  ( 9-yp)o<p<n  and 
(uP)o<p<n  are  bases  of  the  space  Vn  and 

Rnf  =  f-PYJ.  (26) 

We  choose  a  vector  gln  £  V  which  satisfies 


|<  Rn  f,  gln  >|  >  a  sup  |<  Rnf,g1  >|.  (27) 

7er 

If  <  Rnf,g-yn  > /  0,  then  the  vector  gln  cannot  belong  to  the  space  V,, 
since  Rnf  is  orthogonal  to  Vn.  Hence  the  vectors  (g-yp)o<p<n  are  linearly 
independent  .  The  next  vector  un  of  the  Gram-Schmidt  basis  is  obtained  by 
subtracting  from  gln  its  projection  on  the  space  V„ 


un  —  9-y  n 


n— 1 

E 

p=0 


<  glnTup  > 


(28) 


The  family  (up) o<p<n  is  an  orthogonal  basis  of  Vn+i.  The  residue  Rn+1  f  is 
defined  by 

'■  .  r 

ig-+i/  =  /-Fv„t,/  =  /-E  i,;7;  v  (29) 

p= o  1 1 “pm 

This  can  also  be  rewritten 

Rn+lf=Rnf_  <Ry\ln>Un.  (30) 

Since  Rn /  is  orthogonal  to  the  vectors  (glp) o<P<m  equation  (28)  imphes 
that  <  Rnf,  un  >  =  <  Rnf,gln  >  and  thus 

Rn+1f  =  Rn  f  -  <  >un ■  (31) 

I  l^n|  | 

This  equation  is  similar  to  the  residue  updating  equation  (10)  of  a  match¬ 
ing  pursuit,  but  instead  of  subtracting  a  vector  in  the  direction  of  gln,  we 
subtract  a  component  in  a  direction  orthogonal  to  all  vectors  previously 
selected.  Since  Rn+1  f  and  un  are  orthogonal, 


<Rnf,g > 


\\Rn+1f\\2  =  \\Rnf\\2 


2 


2 


(32) 


An  orthogonal  pursuit  guarantees  that  the  selected  vectors  {g-yn)o<n<m 
are  linearly  independent,  and  computes  the  best  possible  approximation  of 
/  from  these  vectors.  Since  R° f  =  /,  we  derive  from  equations  (31)  and 
(32)  that  for  any  m  >  0 

/=  E  <R\\f,U2n  >^n  +  Rmf,  (33) 

0<ri<m 

and 

i/i2  =  E  <  R7j%' >  |;  +  \Rmn2-  <34) 

The  derivations  are  similar  to  those  for  equations  (13)  and  (14).  The  next 
theorem  is  similar  to  Theorem  2  and  guarantees  the  convergence  of  the 
orthogonal  pursuit  [4]. 


Theorem  3  Let  /  £  H.  Let  N  be  the  dimension  of  H  (N  may  be  infinite). 
The  orthogonal  matching  pursuit  converges  in  M  <  N  iterations  (M  may 
be  infinite  if  N  is  infinite).  The  residue  Rnf  defined  inductively  by  equation 
(31)  satisfies 


lim  \\Rnf\\  =  0, 

n^M 


(35) 


/ 


E 

0  <n<M 


<  Rnf,g~/n  > 


(36) 


and 


E 

0  <.n<M 


<  Rnf,gln  > 


(37) 


If  H  is  of  finite  dimension,  the  orthogonal  pursuit  converges  within  a  finite 
number  of  iterations. 


Our  primary  objective  is  not  to  expand  /  over  (un)o<n<M  but  rather 
over  (g-yn)o<n<M-  We  want  coefficients  (/ 3n)o<n<M  such  that 

/=  E  /W  (38) 

0  <n<M 

Since  un  G  V„  and  (glp)o<p<n  is  a  basis  of  V„,  we  can  decompose  un  into 

n 

H n  =  ^  ^  ^P,n9'yp'  (39) 

p= 0 
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The  coefficients  bfj:n  can  be  calculated  while  computing  the  orthogonal  match¬ 
ing  pursuit,  as  explained  in  section  5.  Inserting  expression  (39)  into  (36) 
yields 

<  Rn f,gln  >  y-'  i.  (a n\ 

/=  - M7-M2 -  (40) 

0  <n<M  1 1  “nil  p=0 

One  could  naively  try  to  rearrange  the  terms  of  this  double  summation  to 
obtain 

f  —  k  ^  f 1  9'<n  '>  /  i  -i  \ 

/  —  dip  1  n  ||  1 1 2  ■  (41) 

0  <p<M  p<n<lM  II  n  1 1 

However,  when  M  =  +oo  the  infinite  sum  over  n  that  defines  each  coefficient 
(ip  may  not  converge.  Such  a  situation  arises  when  the  family  (gln)o<n<M 
is  not  a  Riesz  basis  of  the  closed  space  Vj if  that  it  generates.  For  such  a 
case,  we  cannot  obtain  an  expansion  of  the  form  (38)  from  the  orthogonal 
matching  pursuit.  If  the  signal  space  H  has  a  finite  dimension  N,  then  M 
is  finite,  so  we  can  always  invert  the  two  sums  of  (40)  to  obtain  (41).  The 
basis  (g-yn)o<n<M  may,  however,  be  very  badly  conditioned  in  which  case  we 
can  have  numerical  instabilities: 

£  l/JJ2  »  ll/ll2-  (42) 

0  <n<M 

The  residues  of  orthogonal  matching  pursuits  in  general  decrease  faster 
than  the  residues  of  non- orthogonal  matching  pursuits.  However,  this  or¬ 
thogonal  procedure  can  yield  unstable  expansions  and  requires  much  more 
numerical  computation  because  of  the  Gram-Schmidt  orthogonalization. 
The  implementation  and  computational  complexity  of  these  two  algorithms 
is  compared  in  the  next  section. 

5  Numerical  Implementations  of  Matching  Pur¬ 
suits 

We  describe  fast  implementations  of  non-orthogonal  and  orthogonal  match¬ 
ing  pursuits  in  finite  dimensional  spaces,  and  compare  their  performance. 
Software  implementing  matching  pursuits  for  time-frequency  dictionaries  is 
available  through  anonymous  ftp  at  the  address  cs.nyu.edu  .  Instructions 
are  in  the  hie  README  of  the  directory  /pub/wave/software. 

At  stage  n  of  the  pursuit,  we  suppose  that  the  inner  products  (<  Rnf,9l> 
)7ep  have  already  been  computed.  We  choose  the  optimality  factor  a  =  1, 
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so  that  the  first  step  is  to  find  gln  such  that 

I  <  Rnf,9~,n  >  |  =  sup  |  <  Rnf,g1  >  |. 

7er 

If  all  the  inner  products  are  stored  in  an  open  hash  table,  finding  this  supre- 
mum  requires  on  average  0(1)  operations.  Otherwise,  one  needs  to  search 
across  the  whole  set  of  inner  products. 

For  a  non-orthogonal  matching  pursuit,  once  the  vector  gln  is  selected, 
we  compute  the  inner  product  of  the  new  residue  Rn+1  f  with  all  </7  £  V, 
with  an  updating  formula  derived  from  equation  (10) 

<  Rn+1f,g'1  >=<  Rnf,g-y  >  -  <  >  <  g^g-r  >  •  (43) 


Since  we  previously  stored  <  Rnf,g-y  >  and  <  Rnf,gln  >,  this  update  re¬ 
quires  only  the  computation  of  <  gln,g-y  >.  Dictionaries  are  generally  built 
so  that  few  of  these  inner  products  are  non-zero,  and  non-zero  inner  products 
are  computed  with  a  small  number  of  operations.  Let  us  suppose  that  such 
inner  product  computations  are  done  in  0(1)  operations  and  that  there  are 
O(Z)  non-zero  inner  products  for  any  g7n.  Computing  {<  Rn+1f,g7  >}7gp 
thus  requires  0(IZ )  operations.  Hence,  the  total  numerical  complexity  of 
computing  P  matching  pursuit  iterations  is  O(PIZ).  An  efficient  implemen¬ 
tation  of  a  discrete  Gabor  dictionary  [10]  has  1=1  and  Z  =  N,  so  that  P 
iterations  require  O(NP)  operations.  Since  the  dictionary  has  0(Nlog  N) 
vectors,  the  total  memory  needed  for  the  algorithm  is  0(N  log  N). 

For  an  orthogonal  matching  pursuit  algorithm,  once  the  vector  gln  is 
selected,  we  must  compute  the  orthogonal  vector  un  with  the  Gram-Schmidt 
formula  (28).  We  suppose  that  for  p  <  n,  we  have  already  computed  the 
expansion  coefficient  of  each  up  in  (glk)o<k<p, 


u„ 


From 


Ur, 


we  can  compute  the  expansion 


(44) 

k= 0 

n— 1  .  . 

<  g~tn  5  up  >  „ 

2^  ||„  1 12 
p= o  II  “pH 

(45) 

n 

:  E  ^k,nglk- 

k= 0 

(46) 
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If  the  inner  product  of  any  two  elements  in  V  is  calculated  in  0(1 )  oper¬ 
ations,  the  (bk,n)o<k<n  are  obtained  in  0(nl  +  n2)  operations.  We  then 
compute  the  inner  product  of  the  new  residue  Rn+1  f  with  any  g 7  G  P,  from 
the  orthogonal  updating  formula  (31) 

<  Rn+1f,g1  >=<  Rnf,g1  >  -  <  Rnf,gln  >  <  un,gi  >  .  (47) 

Since 

n 

9^  ^ ^  1 9^i  ^  ?  (48) 

k= 0 

computing  (<  Rn+1f,g7  >)7^r  requires  0  (nIZ)  operations.  The  total  num¬ 
ber  of  operations  to  compute  P  orthogonal  matching  pursuit  iterations  is 
therefore  0(P3  +  P2IZ )  operations.  For  a  dictionary  of  discrete  Gabor 
signals  with  signals  of  size  N,  since  /  =  1,  Z  =  N,  and  P  <  N,  the  num¬ 
ber  of  operations  is  0(NP2).  It  also  requires  0(NlogN  +  P2)  memory  to 
store  the  inner  products  (<  Rnf,g1  >)7gr  and  the  expansion  coefficients 

(bk  ,p)o<k,p<.n  ■ 

For  P  iterations,  the  non-orthogonal  matching  pursuit  algorithm  is  P 
times  faster  than  the  orthogonal  one.  When  P  is  large,  which  is  the  case 
in  many  signal  processing  applications,  the  orthogonal  pursuit  algorithm  is 
much  slower  than  the  non-orthogonal  one,  and  requires  much  more  memory. 
When  P  remains  small,  the  orthogonal  pursuit  is  more  advantageous  because 
it  converges  faster. 

6  Matching  Pursuit  With  Time- Frequency  Dic¬ 
tionaries 

For  dictionaries  of  time- frequency  atoms,  a  matching  pursuit  yields  an  adap¬ 
tive  time- frequency  transform.  It  decomposes  any  function  f(t )  E  L2(R) 
into  the  sum  of  complex  time-frequency  atoms  that  best  match  its  residues. 
This  section  describes  the  properties  of  this  particular  matching  pursuit  de¬ 
composition.  We  derive  a  new  type  of  time-frequency  energy  distribution 
by  summing  the  Wigner  distributions  of  the  time- frequency  atoms. 

Since  a  time-frequency  atom  dictionary  is  complete,  Theorem  2  proves 
that  a  matching  pursuit  decomposes  any  function  /  E  L2(R)  into 

+oo 

/  =E<^/,57,  >  9ln,  (49) 

n= 0 
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where  7„  =  (s„,u„,£„)  and 

9m{t)  =  (so) 

VSn  Sn 

These  atoms  are  chosen  to  best  match  the  residues  of  /. 

We  derive  a  new  time- frequency  energy  distribution  from  the  decompo¬ 
sition  of  any  f(t)  within  a  time- frequency  dictionary,  by  adding  the  Wigner 
distribution  of  each  selected  atom.  Recall  that  the  cross  Wigner  distribution 
of  two  functions  /(/)  and  h(t)  is  defined  by 

W[f,  J  f(t  +  T-)h{t  -  T-)e~ZWTdT.  (51) 

The  Wigner  distribution  of  f{t)  is  Wf{t,u )  =  W[f,f]{t,u>).  Since  the 
Wigner  distribution  is  quadratic,  we  derive  from  the  atomic  decomposition 
(49)  of  fit)  that 

+  00 

Wf(t,u)  =  ^2  \<  Rnf,gln  >\2Wgln{t,u>)  (52) 

n= 0 

+oo  +oo 

+  J2  J2  <  Rnf,gln  >  <  Rmf,9-ym  >W[g-yn,gyJ(t,oj). 

n= 0  m=0,m^n 

The  double  sum  corresponds  to  the  cross  terms  of  the  Wigner  distribution. 
It  contains  the  terms  that  one  usually  tries  to  remove  in  order  to  obtain  a 
clear  picture  of  the  energy  distribution  of  /( t)  in  the  time- frequency  plane. 
We  therefore  keep  only  the  first  sum  and  define 

+oo 

Ef(t,oj)  =  Y.\<Rnf>9ln  >\2Wgln(t,u;).  (53) 

n= 0 

A  similar  decomposition  algorithm  over  time- frequency  atoms  was  derived 
independently  by  Qian  and  Chen  [12]  in  order  to  define  this  energy  dis¬ 
tribution  in  the  time-frequency  plane.  From  the  well  known  dilation  and 
translation  properties  of  the  Wigner  distribution  and  the  expression  (50)  of 
a  time- frequency  atom,  we  derive  that  for  7  =  (s,  u) 

W9l(t,u>)  =  WgC^s{u  -  0),  (54) 

and  hence 

+  OO  , 

Efit^)=  £|<  Rnf,gln  >| *Wg(—^,sn(L0-tn)).  (55) 

n—n  Sn 
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The  Wigner  distribution  also  satisfies 


Wg(t,  io)dt  duo  =  \\g\\2  =  1, 


so  the  energy  conservation  equation  ( 18)  implies 


+oo  /*+ CO 


/-too  r 

-oo  J  — 


Ef(t,u>)dt  duo  =  | /|p 


(56) 


(57) 


We  can  thus  interpret  Ef(t,uo)  as  an  energy  density  of  /  in  the  time- 
frequency  plane  (f,u>).  Unlike  the  Wigner  and  the  Cohen  class  distributions, 
it  does  not  include  cross  terms. 

If  g(t)  is  the  Gaussian  window 

//(/)  .  ..  2'/'V  (58) 


then 

Wg(t,uo)  =  2e-M*2+(§d2),  (59) 

so  Ef(t,  uo )  remains  positive.  The  time- frequency  atoms  (/,,(/)  are  then  called 
Gabor  functions.  The  time- frequency  energy  distribution  Ef(t,uo)  is  a  sum 
of  Gaussian  blobs  whose  locations  and  variances  along  the  time  and  fre¬ 
quency  axes  depend  upon  the  parameters  (sn,  «„,£„). 

Fig.  1(a)  is  a  signal  /  of  512  samples  that  is  built  by  adding  chirps, 
truncated  sinusoidal  waves  and  waveforms  of  different  time-frequency  lo¬ 
calizations.  No  Gabor  functions  have  been  used  to  construct  this  signal. 
Fig.  1(b)  shows  the  time- frequency  energy  distribution  Ef(t,uo).  Since 
Ef(t,uo )  =  E/(f,  — u>),  we  only  display  its  values  for  uo  >  0.  Each  Gabor 
time- frequency  atom  selected  by  the  matching  pursuit  is  an  elongated  Gaus¬ 
sian  blob  in  the  time-frequency  plane.  We  clearly  see  appearing  two  chirps 
that  cross  each  other,  with  a  localized  time- frequency  waveform  on  the  top 
of  their  crossing  point.  We  can  also  detect  closely  spaced  Diracs  and  trun¬ 
cated  sinusoidal  waves  having  close  frequencies.  Several  isolated  localized 
time-frequency  components  also  appear  in  this  energy  distribution. 

Fig.  2(a)  is  the  graph  of  a  speech  recording  corresponding  to  the  word 
“greasy”,  sampled  at  16  kHz.  From  the  time- frequency  energy  displayed  in 
Fig.  2(b),  we  can  see  the  low-frequency  component  of  the  “g”  and  the  quick 
burst  transition  to  the  “ea”.  The  “ea”  has  many  harmonics  that  are  lined 
up  but  we  can  also  see  localized  high-frequency  impulses  that  correspond 
to  the  pitch.  The  “s”  component  has  a  time- frequency  energy  spread  over 
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a  high-frequency  interval.  Most  of  the  signal  energy  is  characterized  by  a 

||  D  n  f  || 

few  time-frequency  atoms.  For  n  =  250  atoms,  |jjj|  =  -169,  although  the 
signal  has  5782  samples,  and  the  sound  recovered  from  these  atoms  is  of 
excellent  quality. 

Let  us  now  compare  the  decay  rate  of  the  residues  for  non-orthogonal 
matching  pursuits  versus  orthogonal  matching  pursuits.  The  top  curve  in 
Fig.  3(a)  gives  the  decay  of  log10  as  a  function  of  the  number  of 

iterations  n,  for  a  non-orthogonal  matching  pursuit.  For  n  >  130,  the  decay 
rate  is  almost  constant.  This  confirms  the  exponential  decay  proved  by 
Theorem  2.  The  bottom  curve  in  Fig.  3(a)  gives  the  decay  of  log10 
as  a  function  of  the  number  of  iterations  n,  for  an  orthogonal  matching 
pursuit.  For  the  first  180  iterations  we  cannot  distinguish  the  two  curves, 
which  means  that  the  decay  of  the  residues  for  the  two  algorithms  is  nearly 
identical.  This  indicates  that  the  atoms  selected  by  the  non-orthogonal 
pursuit  are  nearly  orthogonal.  After  this  point,  we  see  that  the  residues  of 
the  orthogonal  pursuit  decay  much  faster  and  for  n  —  N  =  512,  the  number 
of  samples  of  the  signal,  the  residue  is  zero.  The  next  section  analyzes  more 
precisely  this  divergence  of  the  convergence  rates  of  the  orthogonal  and  non- 
orthogonal  pursuits.  The  point  of  separation  of  the  rates  corresponds  to  a 
stage  at  which  the  residues  have  no  more  “coherent”  structures.  At  this 
stage,  the  residue  Rnf  has  statistical  properties  that  are  very  close  to  a 
the  realization  of  a  stationary  white  noise.  For  many  information  processing 
applications,  we  stop  the  decomposition  when  these  coherent  structures  have 
disappeared,  so  for  such  applications  the  orthogonal  pursuit  does  not  offer 
much  advantage. 

7  Chaos  in  Matching  Pursuit  and  Noise  Removal 

A  non-orthogonal  matching  pursuit  can  require  an  infinite  number  of  iter¬ 
ations  to  converge,  even  in  finite  dimensions.  We  know  already  that  the 
norms  of  these  residues  converge  to  zero;  we  now  examine  in  more  detail  the 
asymptotic  behavior  of  the  residues.  Numerical  experiments  suggest  that 
matching  pursuits  are  chaotic  maps,  and  we  can  prove  this  for  a  particu¬ 
lar  dictionary.  Experiments  also  show  that  these  matching  pursuits  possess 
invariant  measures,  and  we  use  this  property  to  develop  a  noise  removal 
algorithm. 

To  study  the  asymptotic  properties  of  the  residues,  we  first  renormalize 
them  to  prevent  their  convergence  to  zero.  We  define  the  renormalized 
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Figure  1(b):  Time- frequency  energy  distribution  Ef(t,u )  of  the  signal 
shown  in  (a).  The  horizontal  axis  is  time.  The  vertical  axis  is  frequency. 
The  highest  frequencies  are  on  the  top.  The  darkness  of  this  time- frequency 
image  increases  with  the  value  Ef(t,u>). 
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Figure  2(b):  time- frequency  energy  distribution  of  the  speech  recording 
shown  in  (a).  We  see  the  low-frequency  component  of  the  “g”,  the  quick 
burst  transition  to  the  “ea”  and  the  harmonics  of  the  “ea”.  The  “s”  has 
energy  spread  over  high  frequencies. 
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residue 


5»f_  RUf 

J  \\RnfW 

Renormalized  matching  pursuit  is  the  map  defined  by 

(60) 

M(Rnf )  =  Rn+1f. 

(61) 

Since 

\\Rn+1f\\2  =  \\Rnff- \<Rnf,9^>\\ 

(62) 

we  obtain  that 

M (Rn  f  )  —  Rnf~  ^  Rn ^  Sin 
\/l  -  1  <  Rnf,9in  >  I2 

(63) 

At  each  iteration  the  renormalized  matching  pursuit  map  removes  the 
largest  dictionary  component  of  the  residue  and  renormalizes  it.  This  action 
is  much  like  that  of  a  left-shift  operator  acting  on  a  base- IV  decimal  num¬ 
ber:  the  shift  operator  removes  the  most  significant  (leftmost)  digit  of  the 
expansion  and  then  “renormalizes”  the  expansion  by  multiplying  by  N.  Let 
Sjv  be  the  set  of  all  base  N  decimals.  The  left-shift  map  Xjv  :  £ n  £jv  is 
formally  defined  by 

LN{O.S1S2S3  .  .  .)  =  0..S2S3-S4 -  (64) 

where  O.S1S2  •  •  •  is  the  base-JV  decimal  jy*  •  The  shift  map  is  known 

to  be  a  chaotic  map,  which  suggests  that  renormalized  matching  pursuits 
share  this  property. 

Additional  evidence  that  renormalized  matching  pursuits  are  chaotic  can 
be  found  in  the  fact  that  the  map  has  “sensitive  dependence”  on  the  initial 
signal  /  when  /  is  close  to  a  dictionary  element.  Consider  two  signals  /1 
and  /2  defined  by 

/1  =  (1  -  e)g  +  eh!  (65) 

and 

f2  =  (1  -  e)g  +  eh2  (66) 

where  g  is  the  closest  dictionary  element  to  /1  and  f2,  ||/ii||  =  || h2 1|  =  1, 
and  <  hi,g  >  —  <  h2,g  >=  0.  Then  ||/i  —  /2J  =  e\hi  —  h2\  can  be  made 
arbitrarily  small,  while  \Rfi  —  -R/2I  =  ||hi  —  h2  \  is  of  order  1.  The  map  thus 
separates  points  near  dictionary  elements. 

We  examined  the  renormalized  matching  pursuit  map  for  a  dictionary 
in  3  dimensions  [4].  Through  symmetry  operations  this  map  can  be  reduced 
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to  a  1-dimensional  map,  which  we  prove  is  topologically  equivalent  to  a  left- 
shift  map.  Thus,  renormalized  matching  pursuit  in  this  case  is  a  chaotic 
map  whose  properties  are  completely  understood. 

Numerical  experiments  with  several  types  of  dictionaries,  including  Ga¬ 
bor  dictionaries,  provide  evidence  that  the  renormalized  matching  pursuit 
map  possesses  an  invariant  measure  and  that  it  is  mixing.  The  mixing 
property  means  that  if  we  start  with  a  collection  of  test  signals,  then  af¬ 
ter  sufficient  iterations  of  renormalized  matching  pursuit,  the  density  of  the 
residues  in  the  signal  space  will  be  close  to  the  invariant  density  function.  To 
express  this  result  in  another  way,  consider  a  random  process  which  yields 
signals  with  a  probability  measure  on  the  signal  space  given  by  our  invariant 
measure.  Then  the  mixing  property  implies  that  after  sufficient  iterations 
of  the  map,  the  residues  will  look  like  realizations  of  this  process. 

The  renormalized  matching  pursuit  map  continually  sets  to  zero  the 
largest  of  the  dictionary  components  of  the  current  residue,  and  redistributes 
the  energy  from  this  removed  component  over  the  remainder  of  the  residue. 
We  expect  this  system  to  be  near  equilibrium  when  all  dictionary  compo¬ 
nents  of  the  residue  have  roughly  the  same  magnitude— when  this  is  so,  the 
operations  of  setting  the  largest  component  to  zero  and  of  redistributing  the 
energy  have  little  effect.  The  residues  converge  to  realizations  of  a  random 
process  for  which  the  distribution  of  the  dictionary  components  is  flat.  This 
invariant  process  can  be  interpreted  as  a  generic  noise  with  respect  to  our 
dictionary,  which  we  call  dictionary  noise. 

For  a  Gabor  dictionary  we  observe  that  after  several  iterations  the  residues 
have  statistical  properties  that  are  close  to  realizations  of  a  white  station¬ 
ary  process.  To  better  understand  this  phenomenon,  we  first  note  that  the 
Gabor  dictionary  is  invariant  under  translation  and  frequency  modulation, 
which  means  that  for  any  g1[t)  E  T> ,  and  (u,  £)  E  R2,  there  exists  a  <j)  E  R 
such  that  e^S&g^i  —  u)  E  V.  One  can  prove  [4]  that  for  a  translation 
and  modulation  invariant  dictionary,  if  there  exists  an  invariant  measure, 
then  this  measure  is  invariant  with  respect  to  operators  that  translate  sig¬ 
nals  in  time  or  frequency.  This  invariant  measure  therefore  corresponds  to  a 
white  stationary  process.  A  detailed  analysis  of  the  invariant  measure  was 
performed  performed  for  a  dictionary  composed  of  a  discrete  Dirac  basis 
plus  a  discrete  Fourier  basis.  This  dictionary  is  clearly  invariant  by  trans¬ 
lations  and  frequency  modulations.  We  constructed  a  stochastic  differential 
equation  model  of  the  evolution  of  the  renormalized  residues  and  solved 
the  corresponding  Fokker-Planck  equation  to  compute  the  properties  of  the 
invariant  measure.  We  obtained  excellent  agreement  with  numerical  data 
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[4]- 

An  important  measure  of  the  flatness  of  the  signal  /  with  respect  to  a 
dictionary  is  the  correlation  ratio 

A(/)  =  sup  ^  >^.  (67) 

7er  I/I 

The  correlation  ratio  is  the  fraction  of  the  signal  energy  which  will  be  re¬ 
moved  by  one  iteration  of  a  matching  pursuit  when  the  optimality  ratio 
a  —  1.  Signals  which  possess  structures  closely  resembling  dictionary  el¬ 
ements  will  have  large  values  of  A (/).  As  the  matching  pursuit  proceeds, 
these  structures  are  removed,  and  A (Rk  f)  decreases.  When  signal  residues 
are  as  flat  as  realization  of  a  dictionary  noise,  the  convergence  rate  of  the 
pursuit  is  near  a  minimum.  It  is  often  not  worth  continuing  the  matching 
pursuit  iterations  since  the  signal  includes  no  structures  that  strongly  corre¬ 
late  to  dictionary  elements.  Let  A^  be  the  expected  value  of  the  correlation 
ratio  for  the  dictionary  noise.  This  constant  is  computed  through  numerical 
experiments  over  many  iterations  of  the  map.  We  call  the  coherent  struc¬ 
tures  of  a  signal  /  the  first  m  vectors  {gln)o<n<m  that  correlate  with  the 
residue  better  than  the  average  correlation  ratio  Aw.  For  0  <  n  <  m  we 
have 

X(Rnf)  >  A^  (68) 

and  for  m  we  have 

X(Rmf  )  <  Xrcc.  (69) 

The  decay  rate  of  the  energy  of  the  residues  depends  only  upon  the 
correlation  ratio.  If  we  set  the  optimality  factor  a  =  1,  then  (11)  implies 
that 

l*“+1/l_n  A„„n,§.  ,7(n 

■pF7r-(  /”'  1 

Fig.  3(b)  displays  the  correlation  ratio  of  the  residues  A 2(Rnf)  for  the  signal 
in  Fig.  1(a).  After  180  iterations,  A  (Rnf)  fluctuates  around  the  mean  A^.  It 
is  at  that  same  point  that  the  decay  rate  of  the  orthogonal  pursuit  residues, 
shown  in  Fig.  3(a),  becomes  faster  than  the  decay  rate  of  the  non-orthogonal 
pursuit.  This  indicates  that  the  orthogonal  pursuit  converges  significantly 
faster  than  the  non-orthogonal  pursuit  only  when  the  residues  are  already 
close  to  the  realization  of  a  dictionary  noise.  For  information  processing 
applications,  it  is  often  not  useful  to  continue  the  decomposition  beyond 
this  point,  so  for  such  applications  the  orthogonal  pursuit  does  not  offer 
much  advantage  over  the  non-orthogonal  pursuit. 
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The  ability  to  hud  the  coherent  structures  of  a  signal  can  be  used  to 
remove  noise  from  signals.  In  this  approach,  the  noise  is  entirely  defined 
by  the  choice  of  the  dictionary  and  corresponds  to  the  invariant  measure  of 
the  matching  pursuit  map.  For  a  Gabor  dictionary,  the  dictionary  noise  is 
white  and  stationary,  but  this  is  not  necessarily  true  for  other  dictionaries. 
The  dictionary  must  be  chosen  so  that  its  elements  correlate  as  closely  as 
possible  the  signal  inner  structures,  but  so  that  they  avoid  high  correlations 
with  realizations  of  the  noise  to  be  removed.  During  the  matching  pursuit 
decomposition,  we  test  the  correlation  ratio  (68)  and  stop  when  condition 
(69)  is  true,  i.e.  when  we  have  selected  all  coherent  structures.  Fig.  4(a) 
shows  a  signal  obtained  by  adding  a  Gaussian  white  noise  to  the  speech 
recording  given  in  Fig.  2(a),  with  a  signal  to  noise  ratio  of  1.5  dB.  Fig. 
4(b)  is  the  time- frequency  energy  distribution  of  this  noisy  signal.  The 
white  noise  generates  time-frequency  atoms  spread  across  the  whole  time- 
frequency  plane,  but  we  can  still  distinguish  the  time-frequency  structures 
of  the  original  signal  because  their  energy  is  better  concentrated  in  this 
plane.  This  signal  contains  m  =  76  coherent  structures  which  are  displayed 
in  the  time-frequency  energy  distribution  of  Fig.  5(a).  Fig.  5(b)  is  the 
signal  reconstructed  from  these  coherent  time- frequency  atoms.  The  SNR 
of  the  reconstructed  signal  is  6.8  dB.  The  white  noise  has  been  removed  and 
the  recovered  signal  has  a  good  auditory  quality  because  the  main  time- 
frequency  structures  of  the  original  speech  signal  have  been  retained. 


8  Conclusion 

Matching  pursuits  provide  extremely  flexible  signal  representations  since  the 
choice  of  dictionaries  is  not  limited.  For  information  processing  or  compact 
signal  coding,  it  is  important  to  have  strategies  to  adapt  the  dictionary  to 
the  class  of  signal  that  is  decomposed.  Time-frequency  dictionaries  include 
vectors  that  are  spread  between  the  Fourier  and  Dirac  bases.  They  are 
regularly  distributed  on  the  unit  sphere  of  the  signal  space  and  are  thus  well 
adapted  to  decomposing  signals  for  which  we  have  little  prior  information. 
When  enough  prior  information  is  available,  one  can  adapt  the  dictionary 
to  the  probability  distribution  of  the  signal  class  within  the  signal  space  H. 
Learning  a  dictionary  is  equivalent  to  finding  the  important  inner  structures 
of  the  signals  that  are  decomposed.  Classical  algorithms  for  the  optimization 
of  code  books,  such  as  LBG  [7],  do  not  converge  to  satisfying  solutions  in 
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such  high  dimensional  spaces.  We  are  currently  studying  the  problem  of 
adapting  the  dictionary  to  specific  signal  properties. 
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Figure  3(a):  The  top  and  bottom  curves  gives  respectively  the  decay  of 
logio  for  a  non-orthogonal  and  an  orthogonal  matching  pursuit,  applied 
to  the  signal  in  Fig.  1(a). 


Figure  3(b):  A (Rnf)  as  a  function  of  the  number  of  iterations  n,  for  the 
signal  in  Fig.  1(a). 
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Figure  4(a):  signal  obtained  by  adding  a  Gaussian  white  noise  to  the  speech 
recording  shown  in  Fig. 2(a).  The  signal  to  noise  ratio  is  1.5db. 


Figure  4(b):  time-frequency  energy  distribution  of  the  noisy  speech  sig¬ 
nal.  The  energy  distribution  of  the  white  noise  is  spread  across  the  whole 
time-frequency  plane. 
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Figure  5(a):  time-frequency  energy  distribution  of  the  m  =  76  coherent 
structures  of  the  noisy  speech  signal  shown  in  Fig. 3(a). 


Figure  5(b):  time-frequency  energy  distribution  of  the  m  =  76  coherent 
structures  of  the  noisy  speech  signal  shown  in  Fig.  3.  (b):  signal  recon¬ 
structed  from  the  76  coherent  structures  shown  in  (a).  The  white  noise  has 
been  removed. 
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